Abstract

We analyse the wavelet shrinkage algorithm of Donoho and Johnstone in order to assess the quality of the reconstruction of a signal obtained from noisy samples. We prove deviation bounds for the maximum of the squares of the error, and for the average of the squares of the error, under the assumption that the signal comes from a Hölder class, and the noise samples are independent, of zero mean, and bounded. Our main technique is Talagrand's isoperimetric theorem. Our bounds refine the known expectations for the average of the squares of the error.
1 Introduction



We address the classical problem of the reconstruction of signal samples from noisy samples. We consider an original signal of bounded duration $f: t \in [0,1] \mapsto f(t) \in \mathbb{R}$. We also have additive noise $e: [0,1] \to \mathbb{R}$. Thus, the observed noisy signal at time $t$ is $y(t) = f(t) + e(t)$.

We sample the noisy signal at $n$ uniformly spaced instants and we denote the sample values by $y_i = f_i + e_i = f(\frac{i}{n}) + e(\frac{i}{n})$ (for $1 \le i \le n$). Our goal is to recover a good approximation of the original signal samples $(f_1, \ldots, f_n)$ from the noisy signal samples $(y_1, \ldots, y_n)$. For this to be possible we need some assumptions that distinguish the signal from the noise:

• The original signal $f$ has a certain degree of "smoothness", i.e., $f$ belongs to a Hölder class $\Lambda^\alpha(M)$ for some $\alpha > 0$ and $M > 0$.

• The noise is "random", i.e., $(e_1, \ldots, e_n)$ consists of $n$ independent Borel random variables.

The Hölder classes are defined as follows:

For $0 < \alpha \le 1$, $\Lambda^\alpha(M) = \{h \in \mathbb{R}^{[0,1]} : (\forall x_1, x_2 \in [0,1])\ |h(x_1) - h(x_2)| \le M|x_1 - x_2|^\alpha\}$.

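For $1 < \alpha$, $\Lambda^\alpha(M) = \{h \in \mathbb{R}^{[0,1]} : (\forall x \in [0,1])\ |h'(x)| \le M$, $h^{(\lfloor\alpha\rfloor)}$ exists, and $(\forall x_1, x_2 \in [0,1])\ |h^{(\lfloor\alpha\rfloor)}(x_1) - h^{(\lfloor\alpha\rfloor)}(x_2)| \le M|x_1 - x_2|^{\alpha - \lfloor\alpha\rfloor}\}$.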

Let $(\hat y_1, \ldots, \hat y_n)$ be an approximation of $(f_1, \ldots, f_n)$, obtained from $(y_1, \ldots, y_n)$. Most commonly, the closeness of this approximation is measured by $\frac{1}{n}\sum_{i=1}^n(\hat y_i - f_i)^2$ or by the expectation $E[\frac{1}{n}\sum_{i=1}^n(\hat y_i - f_i)^2]$ (which makes sense since the $e_i$, and hence the $\hat y_i$, are random variables).

The wavelet shrinkage algorithm of Donoho and Johnstone is a very efficient tool for finding good estimates $\hat y_i$. In outline, the algorithm works as follows:

(Step 0) Choose a wavelet system with $N$ vanishing moments ($N \ge \alpha$); choose a level of coarseness $J_0 \ge 0$ ($J_0$ will depend on $\alpha$), and consider the multi-resolution chain of Hilbert spaces $V_{J_0} \subset V_{J_0+1} \subset \cdots \subset V_J \subset \cdots$.

(Step 1) Apply the Discrete Wavelet Transform (DWT) to the noisy signal samples $(y_1, \ldots, y_n)$, where $n \ge 2^{J_0}$. This yields the "empirical wavelet coefficients" $(\zeta_1, \ldots, \zeta_n)$.

(Step 2) Fix a "threshold" $\lambda_n$ ($> 0$) and apply either "hard" or "soft thresholding" to $(\zeta_1, \ldots, \zeta_n)$.

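Hard thresholding consists of replacing each $\zeta_i$ by 0 when $|\zeta_i| \le \lambda_n$, and keeping $\zeta_i$ unchanged when $|\zeta_i| > \lambda_n$.

Soft thresholding consists of transforming each $\zeta_i$ as follows: $\zeta_i$ is replaced by 0 if $|\zeta_i| \le \lambda_n$; if $\zeta_i > \lambda_n$, $\zeta_i$ is replaced by $\zeta_i - \lambda_n$; if $\zeta_i < -\lambda_n$, $\zeta_i$ is replaced by $\zeta_i + \lambda_n$.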
(Step 3) Apply the inverse DWT to the result of Step 2. This yields the estimate $(\hat y_1, \ldots, \hat y_n)$.
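The four steps can be made concrete in a short sketch. The Python code below is a minimal illustration, not the authors' implementation. It assumes the orthonormal Haar DWT (the simpler of the two wavelet systems used later), a sample length $n = 2^J$, and uses hypothetical helper names (`haar_dwt`, `haar_idwt`, `wavelet_shrinkage`). As in the estimator analysed later, only the detail coefficients are thresholded and the $2^{J_0}$ coarse coefficients are kept. The orthonormal DWT of the sample vector equals $\sqrt n$ times the wavelet coefficients of the piecewise constant function of subsection 2.1, so a threshold $\lambda$ on those coefficients corresponds to $\sqrt n\,\lambda$ here. The threshold in the usage lines is purely illustrative.

```python
import numpy as np


def haar_dwt(x, levels):
    """Orthonormal Haar DWT of a length-2**J vector, applied `levels` times.

    Returns the coarse (scaling) coefficients and a list of detail arrays,
    ordered from the coarsest level to the finest.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # sign matches psi = +1 then -1
        a = (even + odd) / np.sqrt(2.0)
    return a, details[::-1]


def haar_idwt(a, details):
    """Inverse of haar_dwt; `details` is ordered from coarsest to finest."""
    a = np.asarray(a, dtype=float)
    for d in details:
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a


def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def hard_threshold(z, lam):
    return np.where(np.abs(z) > lam, z, 0.0)


def wavelet_shrinkage(y, lam, J0=0, rule=soft_threshold):
    """Steps 1-3: DWT, threshold the detail coefficients, inverse DWT."""
    n = len(y)
    J = int(np.log2(n))
    if 2 ** J != n:
        raise ValueError("the number of samples must be a power of 2")
    coarse, details = haar_dwt(y, J - J0)           # Step 1
    details = [rule(d, lam) for d in details]        # Step 2 (coarse part is kept)
    return haar_idwt(coarse, details)                # Step 3


# Illustrative usage: a smooth signal plus bounded, zero-mean noise.
rng = np.random.default_rng(0)
n = 256
t = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * t) + rng.uniform(-0.25, 0.25, size=n)
y_hat = wavelet_shrinkage(y, lam=0.25 * np.sqrt(2 * np.log(n)))
```

Swapping `rule=hard_threshold` gives the hard-thresholding variant of Step 2.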

To what extent does wavelet shrinkage depend on the smoothness conditions of the signal $f$ and on the randomness conditions of the noise samples $e_i$, and how well do the estimators $\hat y_i$ approximate the original signal $f$? In [6], [7] it was assumed that the $e_i$ are iid Gaussian variables with distribution $N(0, \sigma^2)$, and the threshold was chosen to be $\lambda_n = \sigma\sqrt{2\ln n}$. Assuming that $f \in \Lambda^\alpha(M)$ (the Hölder class) with $\alpha > 0$, it is proved in [6], [7] that $E[\frac{1}{n}\sum_{i=1}^n(\hat y_i - f_i)^2] \le C \cdot (\frac{\log n}{n})^{\frac{2\alpha}{1+2\alpha}}$, where $C$ depends only on $M$ and on the wavelet system used. It was observed in [6], [7] (the proofs are due to Lepskii [9] and to Brown and Low [3]) that this upper bound is optimal over all possible algorithms, if the parameters $\alpha$ and $M$ are not known. For the optimality of the wavelet shrinkage algorithm it is important that the threshold be of the form $c \cdot \sqrt{\log n}$ (where $c$ does not depend on $n$).

Since the publication of these papers, there has been further progress on wavelet shrinkage (chapter 6 of [13] is an excellent reference up to 1999). Most recently, Averkamp and Houdré expanded the scope of wavelet shrinkage by allowing the noise samples $e_i$ to have different distributions $F_i$, chosen from a wide class of distributions. They show in [1] (page 32) that the error expectation of the wavelet shrinkage algorithm for bounded noise is roughly the same as for Gaussian noise, if the parameters $\alpha$ and $M$ of the Hölder class of the signal are not known. They also discuss various choices of thresholds.

All the results on wavelet shrinkage in the literature so far evaluate the quality of the approximation by bounding the expectation $E[\frac{1}{n}\sum_{i=1}^n(\hat y_i - f_i)^2]$, to the best of our knowledge. In this paper we study deviation bounds (rather than just the expectation) of $\frac{1}{n}\sum_{i=1}^n(\hat y_i - f_i)^2$ and of $\max\{(\hat y_i - f_i)^2 : 1 \le i \le n\}$.

Assumptions: We assume that the signal $f$ belongs to a Hölder class $\Lambda^\alpha(M)$, and that the noise samples $e_i$ are independent random variables (with possibly different distributions). The only restrictions on the distributions are that they are Borel measurable, have compact support (contained in an interval $[-\frac{b}{2}, \frac{b}{2}]$), and zero mean. The assumption that the distributions of the noise have bounded support is of course equivalent to assuming that the noise $e_i$ has bounded values ($|e_i| \le \frac{b}{2}$).

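The main results of this paper are the following deviation bounds.

Theorem. For the wavelet shrinkage algorithm with threshold

$$\lambda_{n,\delta} = C\,b\,\bigl(1 + 2\sqrt{(1+\delta)\ln 2}\bigr)\sqrt{\frac{\log n}{n}}$$

(where $C$ depends only on the wavelet system) we have the following deviation bounds: there are $c_1, c_2 > 0$, depending only on $b$, $M$, and $\alpha$, such that for all $n \ge n_0$ and all $\delta > 0$,

$$P\Bigl(\max_{1 \le i \le n}(f_i - \hat y_i)^2 \le (c_1 + c_2\delta)\Bigl(\frac{\log n}{n}\Bigr)^{\frac{2\alpha}{1+2\alpha}}\Bigr) \ge 1 - \frac{2}{n^{1+\delta}}.$$

As a consequence,

$$P\Bigl(\frac{1}{n}\sum_{i=1}^n(f_i - \hat y_i)^2 \le (c_1 + c_2\delta)\Bigl(\frac{\log n}{n}\Bigr)^{\frac{2\alpha}{1+2\alpha}}\Bigr) \ge 1 - \frac{2}{n^{1+\delta}}.$$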



The minimum number of samples, $n_0$, is $2^{\dots}$ when $0 < \alpha \le 1$; when $\alpha > 1$,

$$n_0 = (4\alpha + 2)^{2\alpha+2} \cdot (\log_2(4\alpha + 2))^2.$$

One notices that $n_0$ grows very rapidly with $\alpha$ when $\alpha > 1$. For $\alpha = 2$ we have $n_0 \approx 1.1 \cdot 10^7$; for $\alpha = 3$, $n_0 \approx 3.7 \cdot 10^{10}$, which is impractical. So for large $\alpha$ our theorem is interesting only from an asymptotic point of view. On the other hand, in practice usually $\alpha \le 1$.
2 Preliminaries



2.1 Wavelets 

We will usually follow the notation of regarding wavelets, the only exception being that 
we reverse the multi-resolution indices. Moreover, we only consider real-valued functions with domain $[0,1]$. So we have a sequence of real Hilbert spaces $V_{J_0} \subset V_{J_0+1} \subset \cdots \subset V_j \subset \cdots$, such that the closure of $\bigcup_j V_j$ is $L^2[0,1]$. We let $V_{j+1} = V_j \oplus W_j$ (orthogonal complement). Since we are in the case of compactly supported functions, each $V_j$ is a finite-dimensional real vector space (of dimension $2^j$), with orthonormal basis $\{\varphi_{j,k} : 0 \le k \le 2^j - 1\}$, derived from a scaling function $\varphi$. Let $\psi$ be the wavelet function corresponding to $\varphi$, and let $\{\psi_{j,k} : 0 \le k \le 2^j - 1\}$ be the corresponding orthonormal basis of $W_j$.

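For any function $g \in L^2[0,1]$ we define the piecewise constant function $\bar g: [0,1] \to \mathbb{R}$ as follows: $\bar g(x) = g(\frac{k}{n})$ $(= g_k)$ if $\frac{k-1}{n} < x \le \frac{k}{n}$ for some $k = 1, \ldots, n$; $\bar g(x) = 0$ if $x \notin\, ]0,1]$. The discrete wavelet transform of a vector $(g_1, \ldots, g_n)$ can be obtained by taking the wavelet coefficients of the piecewise constant function $\bar g$. These wavelet coefficients are:

$$c^{(g)}_{j,k} = \langle \bar g, \varphi_{j,k}\rangle = \int_0^1 \bar g(x)\,\varphi_{j,k}(x)\,dx, \quad\text{and}$$

$$d^{(g)}_{j,k} = \langle \bar g, \psi_{j,k}\rangle = \int_0^1 \bar g(x)\,\psi_{j,k}(x)\,dx.$$

Then for any integer $J \ge J_0$:

$$\sum_{k=0}^{2^J-1} c^{(g)}_{J,k}\varphi_{J,k} = \sum_{k=0}^{2^{J_0}-1} c^{(g)}_{J_0,k}\varphi_{J_0,k} + \sum_{j=J_0}^{J-1}\sum_{k=0}^{2^j-1} d^{(g)}_{j,k}\psi_{j,k}.$$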

In this paper we will use two wavelet systems: the Haar wavelets (because of their simplicity, especially for programming purposes), and the interval wavelets with a prescribed number of vanishing moments, based on Daubechies wavelets (Cohen, Daubechies, Jawerth, Vial [4]).

For the Haar wavelets, the scaling function is $\varphi(x) = 1$ when $0 \le x < 1$, and $\varphi(x) = 0$ otherwise. Hence, $\varphi_{j,k}(x) = 2^{j/2}$ when $k2^{-j} \le x < (k+1)2^{-j}$, and $\varphi_{j,k}(x) = 0$ otherwise. The Haar wavelet function is $\psi(x) = 1$ if $0 \le x < \frac12$, $\psi(x) = -1$ if $\frac12 \le x < 1$, and $\psi(x) = 0$ otherwise. Hence, $\psi_{j,k}(x) = 2^{j/2}$ if $k2^{-j} \le x < (k+\frac12)2^{-j}$, $\psi_{j,k}(x) = -2^{j/2}$ if $(k+\frac12)2^{-j} \le x < (k+1)2^{-j}$, and $\psi_{j,k}(x) = 0$ otherwise.

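For the interval wavelet system of [4], with $N$ vanishing moments, the scaling function $\varphi$ and the wavelet function $\psi$ are complicated. But all we need to know about them is the following:

• A multiresolution of $L^2[0,1]$ is obtained, with an orthonormal basis for $V_j$ when $j \ge J_0$:

$$\{\varphi_{j,k} : 0 \le k < 2^j - 2N\} \cup \{\varphi^{\mathrm{left}}_{j,i}, \varphi^{\mathrm{right}}_{j,i} : 0 \le i < N\}.$$

Each $\varphi_{j,k}$ has support $[k2^{-j}, (2N-1+k)2^{-j}]$; each $\varphi^{\mathrm{left}}_{j,i}$ is supported in an interval $[0, a2^{-j}]$ at the left end of $[0,1]$, and each $\varphi^{\mathrm{right}}_{j,i}$ in an interval $[1 - a2^{-j}, 1]$ at the right end, where $a$ depends only on $N$.

The decomposition level $J_0$ is chosen so that $J_0 \ge 1 + \log_2(2N - 1)$. For signals in the Hölder class $\Lambda^\alpha(M)$ we require the number of vanishing moments to be $N \ge \alpha$.

• We also have an orthonormal basis for $W_j$,

$$\{\psi_{j,k} : 0 \le k < 2^j - 2N\} \cup \{\psi^{\mathrm{left}}_{j,i}, \psi^{\mathrm{right}}_{j,i} : 0 \le i < N\}$$

with the same supports as the corresponding $\varphi$ functions.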
• $\varphi$ and $\psi$ are bounded on $[0,1]$ by a constant $C > 0$, independent of $x$ and $N$: $\forall x \in [0,1]$, $|\varphi(x)|, |\psi(x)| \le C$.

For $0 \le k < 2^j - 2N$ ("inside the interval"), $\varphi_{j,k}(x) = 2^{j/2}\varphi(2^jx - k)$. At the ends of the interval $[0,1]$ we have, for $0 \le i < N$ (see [4]),

$$\varphi^{\mathrm{left}}_{j,i}(x) = \sum_{\ell}(-\ell)^i\,2^{j/2}\varphi(2^jx + \ell).$$

A similar formula holds at the right end of the interval $[0,1]$.

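Assuming that $n$ is a power of 2, $n = 2^J$, we have for the function $\bar y$, relative to any wavelet system: $\bar y(x) = \sum_{k=0}^{2^J-1}\langle \bar y, \varphi_{J,k}\rangle\varphi_{J,k}(x)$. Thus for any $J_1$ with $0 \le J_1 \le J$, the DWT transforms $(\langle \bar y, \varphi_{J,k}\rangle)_{0 \le k < 2^J}$ into $\bigl((\langle \bar y, \varphi_{J_1,k}\rangle)_{0 \le k < 2^{J_1}}, (\langle \bar y, \psi_{j,k}\rangle)_{J_1 \le j < J,\ 0 \le k < 2^j}\bigr)$. The DWT is an orthogonal transformation (represented by an orthogonal matrix $W$).

We will always assume that $n$ is a power of 2: $n = 2^J$. Throughout this paper, $\log$ will refer to $\log_2$, and $\ln$ will denote the natural logarithm.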

Let us now return to the analysis of a noisy signal $y(t) = f(t) + e(t)$.

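Lemma 2.1 With respect to the Haar wavelets, the wavelet coefficients of the function $e$ have the following properties:

(H1) For all $j \in [0, J]$ and all $k \in [0, 2^j - 1]$:

$$c^{(e)}_{j,k} = 2^{-J+j/2}\sum_{i=0}^{2^{J-j}-1} e_{i+1+k2^{J-j}}.$$

(H2) For all $j \in [0, J-1]$ and all $k \in [0, 2^j - 1]$:

$$d^{(e)}_{j,k} = 2^{-J+j/2}\Bigl(\sum_{i=0}^{2^{J-j-1}-1} e_{i+1+k2^{J-j}} - \sum_{i=0}^{2^{J-j-1}-1} e_{i+1+(k+\frac12)2^{J-j}}\Bigr).$$

For any function $f: [0,1] \to \mathbb{R}$ belonging to $\Lambda^\alpha(M)$ with $0 < \alpha \le 1$ we have:

(H3) For all $j \in [0, J-1]$ and all $k \in [0, 2^j - 1]$:

$$|d^{(f)}_{j,k}| \le M2^{-j(\frac12+\alpha)}.$$

The proof of this lemma is just a calculation and is given in the Appendix.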
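Identities (H1) and (H2) are easy to check numerically. The sketch below is an illustration under stated assumptions: a random vector stands in for the noise samples, $n = 16$, and the function names are hypothetical. It integrates the piecewise constant function $\bar e$ exactly against the Haar functions defined in subsection 2.1 and compares the result with the closed forms of Lemma 2.1.

```python
import numpy as np


def coeffs_by_integration(g, j, k):
    """Exact <g_bar, phi_{j,k}> and <g_bar, psi_{j,k}> for the Haar functions,
    where g_bar equals g[i-1] on ((i-1)/n, i/n]."""
    n = len(g)
    left, mid, right = k / 2 ** j, (k + 0.5) / 2 ** j, (k + 1) / 2 ** j
    c = d = 0.0
    for i in range(1, n + 1):
        lo, hi = (i - 1) / n, i / n
        first = max(0.0, min(hi, mid) - max(lo, left))    # overlap with the first half of the support
        second = max(0.0, min(hi, right) - max(lo, mid))  # overlap with the second half
        c += g[i - 1] * 2 ** (j / 2) * (first + second)
        d += g[i - 1] * 2 ** (j / 2) * (first - second)
    return c, d


def coeffs_closed_form(g, j, k):
    """(H1) and (H2) of Lemma 2.1; Python index s corresponds to e_{s+1}."""
    n = len(g)
    J = int(np.log2(n))
    block = 2 ** (J - j)
    half = block // 2
    start = k * block                   # index of e_{1 + k 2^(J-j)}
    scale = 2.0 ** (-J + j / 2)
    c = scale * np.sum(g[start:start + block])
    d = scale * (np.sum(g[start:start + half]) - np.sum(g[start + half:start + block]))
    return c, d


rng = np.random.default_rng(0)
e = rng.uniform(-0.5, 0.5, size=16)   # plays the role of (e_1, ..., e_n) with n = 2^4
```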

Lemma 2.2 With respect to the interval wavelet system, the wavelet coefficients of the function $e$ have the following properties:

(D1) For all $j \in [J_0, J]$ and all $k \in [0, 2^j - 1]$:

$$c^{(e)}_{j,k} = 2^{-J+j/2}\sum_{i=0}^{2^{J-j}-1}\alpha_{i,j,k}\,e_{i+1+k2^{J-j}}$$

for some numbers $\alpha_{i,j,k}$ that do not depend on the noise function $e$. Moreover, $|\alpha_{i,j,k}| \le C_\varphi$ for some constant $C_\varphi \ge 1$ depending only on the wavelet system.

(D2) For all $j \in [J_0, J-1]$ and all $k \in [0, 2^j - 1]$:

$$d^{(e)}_{j,k} = 2^{-J+j/2}\sum_{i=0}^{2^{J-j}-1}\beta_{i,j,k}\,e_{i+1+k2^{J-j}}$$

for some numbers $\beta_{i,j,k}$ that do not depend on the noise function $e$. Moreover, $|\beta_{i,j,k}| \le C_\psi$, where $C_\psi \ge 1$ depends only on the wavelet system.

Suppose $f: [0,1] \to \mathbb{R}$ belongs to $\Lambda^\alpha(M)$ with $1 < \alpha$, and suppose the number of vanishing moments $N$ of the wavelet system satisfies $N \ge \alpha$. Then we have:

(D3) For all $j \in [J_0, J-1]$ and all $k \in [0, 2^j - 1]$:

$$|d^{(f)}_{j,k}| \le C_\Lambda M2^{-j(\frac12+\alpha)},$$

where $C_\Lambda \ge 1$ depends only on the wavelet system.

The proof of Lemma 2.2 is just a calculation and is given in the Appendix.



2.2 Talagrand's isoperimetric theorems 



Talagrand's isoperimetric theorems, published in 1995 [12], have had a profound impact on the probabilistic analysis of combinatorial optimization methods; Talagrand's theorems often apply quite directly, giving shorter proofs, often with dramatically better results than previously used methods (see [11], chapter 6). We will use the following result of [12].



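Let $(\Omega_i, \mu_i)$ ($i = 1, \ldots, n$) be Borel probability spaces, and let $\Omega^n = \Omega_1 \times \cdots \times \Omega_n$ be the product space with product measure $P = \mu_1 \times \cdots \times \mu_n$. For $A \subseteq \Omega^n$ and $\omega = (\omega_1, \ldots, \omega_n) \in \Omega^n$, Talagrand's "convex" distance is defined by

$$d_T(\omega, A) = \sup\Bigl\{\inf\Bigl\{\sum_{i=1}^n\beta_i\,I(\omega_i \ne a_i) : (a_1, \ldots, a_n) \in A\Bigr\} : (\beta_1, \ldots, \beta_n) \in \mathbb{R}^n,\ \sum_{i=1}^n\beta_i^2 = 1\Bigr\}.$$

Notation: $I(\omega_i \ne a_i) = 1$ if $\omega_i \ne a_i$, and $I(\omega_i \ne a_i) = 0$ otherwise.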
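Theorem 2.3 (Talagrand, Theorem 4.1.1 in [12]): For any $A \subseteq \Omega^n$ with $P(A) > 0$:

$$\int_{\Omega^n}\exp\Bigl(\frac14\,d_T^2(\omega, A)\Bigr)\,dP(\omega) \le \frac{1}{P(A)}.$$

As a corollary, for all $t > 0$,

$$P\bigl(d_T(\omega, A) \ge t\bigr) \le \frac{1}{P(A)}\exp\Bigl(-\frac{t^2}{4}\Bigr).$$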

3 Deviation bound for $\frac{1}{n}\sum_{i=1}^n(f_i - \hat y_i)^2$

Recall that the input for wavelet shrinkage is $(y_1, \ldots, y_n)$, where $y_i = f_i + e_i$ ($i = 1, \ldots, n$), the $f_i$ are samples from the original signal $f$, and the $e_i$ are additive noise. The $e_i$ are independent Borel random variables. We assume that the noise is bounded (with $|e_i| \le \frac{b}{2}$), so each random variable $e_i$ is a Borel measurable function $e_i: \omega_i \in \Omega_i \mapsto e_i(\omega_i) \in [-\frac{b}{2}, \frac{b}{2}]$. Accordingly, we view $(e_1, \ldots, e_n)$ as a function $\omega = (\omega_1, \ldots, \omega_n) \in \Omega^n \mapsto e(\omega) = (e_1(\omega_1), \ldots, e_n(\omega_n)) \in [-\frac{b}{2}, \frac{b}{2}]^n$. (Borel measurability is assumed in order to apply Talagrand's theorem.) To simplify the notation we often write $e_i(\omega)$ for $e_i(\omega_i)$.

We shall first define a subset $A$ of $\Omega^n$ and then show that

• $P(A) \ge \frac12$ if $n$ is large enough, and

• wavelet shrinkage satisfies our deviation bounds when the noise samples are in $A$.

Then for any $\delta > 0$ we define a subset $B_\delta \subseteq \Omega^n$ such that

• for any $\omega \in \Omega^n$, if Talagrand's distance satisfies $d_T(\omega, A) \le 2\sqrt{(1+\delta)\ln n}$ then $\omega \in B_\delta$;

• wavelet shrinkage satisfies our deviation bounds when the noise samples are in $B_\delta$.

Finally, by applying Talagrand's theorem we obtain our results.

3.1 The subset A 

Recall that we assume $n = 2^J$. For any $\omega \in \Omega^n$ we decompose the noise sample sequence $e(\omega)$ into blocks of length $J$, as follows:

$$e(\omega) = (\ldots, e_{kJ+1}(\omega), \ldots, e_{(k+1)J}(\omega), \ldots)$$

where $k = 0, \ldots, \frac{n}{J} - 1$. Here, for simplicity we regard $\frac{n}{J} = 2^{J-\log J}$ as an integer (i.e., we assume that $J$ is a power of 2).

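For the Haar wavelets we define the subset $A \subseteq \Omega^n$ as follows:

$$A = \Bigl\{\omega \in \Omega^n : (\forall \ell \in [-1, J-\log J])(\forall k \in [0, 2^{J-\log J-\ell}-1]),\ \Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\}.$$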



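For the interval wavelet system we define

$$\begin{aligned}A = \Bigl\{\omega \in \Omega^n :{}& (\forall \ell \in [-1, J-\log J])(\forall k \in [0, 2^{J-\log J-\ell}-1]),\\
&\Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\cdot\alpha_{i,J-\log J-\ell,k}\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\ \text{ and}\\
&\Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\cdot\beta_{i,J-\log J-\ell,k}\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\}.\end{aligned}$$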



We need a classical result from probability theory. 

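Theorem 3.1 (Hoeffding's inequality [8]) Let $X_1, \ldots, X_m$ be independent random variables with $b_1 \le X_i \le b_2$ ($i = 1, \ldots, m$). Then for all $t > 0$,

$$P\Bigl(\Bigl|\sum_{i=1}^m(X_i - E[X_i])\Bigr| \ge t\Bigr) \le 2\exp\Bigl(-\frac{2t^2}{m(b_2 - b_1)^2}\Bigr).$$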

Lemma 3.2 For all $n > 1$, $P(A) \ge 1 - \frac{8}{\log n} + \frac{2}{n}$ for the Haar wavelets, and $P(A) \ge 1 - \frac{16}{\log n} + \frac{4}{n}$ for the interval wavelet system.

In either case, ifn> 256 then P{A) > If n > 2^ then P{A) > i. Moreover, P{A) 
tends to 1 when n — > 00. 

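Proof: We first give the proof for the Haar wavelets. For any $\ell \in [-1, J-\log J]$ and $k \in [0, 2^{J-\log J-\ell}-1]$, the noise samples $e_{kJ2^\ell+1}, \ldots, e_{(k+1)J2^\ell}$ are independent random variables, each with values in $[-\frac{b}{2}, \frac{b}{2}]$. So Hoeffding's inequality applies, and since $E[e_i] = 0$ for all $i$, we obtain for all $t > 0$,

$$P\Bigl(\Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}\Bigr| \ge t\Bigr) \le 2\exp\Bigl(-\frac{2t^2}{J2^\ell b^2}\Bigr).$$

Letting $t = bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}$ we obtain

$$P\Bigl(\Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}\Bigr| \ge bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr) \le 2\exp(-J\ln 2) = \frac{2}{n}. \qquad (1)$$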
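For $\ell \in [-1, J-\log J]$ and $k \in [0, 2^{J-\log J-\ell}-1]$, let

$$A_{\ell,k} = \Bigl\{\omega \in \Omega^n : \Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\}$$

and let $A_\ell = \bigcap_{k=0}^{2^{J-\log J-\ell}-1} A_{\ell,k}$. Then by (1), $P(A_{\ell,k}) \ge 1 - \frac{2}{n}$.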

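For the complements of these sets we have $\overline{A_\ell} = \bigcup_{k=0}^{2^{J-\log J-\ell}-1}\overline{A_{\ell,k}}$, hence $P(\overline{A_\ell}) \le \sum_{k=0}^{2^{J-\log J-\ell}-1}\frac{2}{n} = 2^{J-\log J-\ell}\cdot\frac{2}{n}$. Since $n = 2^J$ we obtain $P(\overline{A_\ell}) \le \frac{2^{1-\ell}}{\log n}$.

Since $A = \bigcap_{\ell=-1}^{J-\log J} A_\ell$ we have

$$P(\overline{A}) \le \sum_{\ell=-1}^{J-\log J}\frac{2^{1-\ell}}{\log n} = \frac{2}{\log n}\Bigl(4 - \frac{\log n}{n}\Bigr) = \frac{8}{\log n} - \frac{2}{n}.$$

Hence, $P(A) \ge 1 - \frac{8}{\log n} + \frac{2}{n}$. This proves the Lemma for the Haar case.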

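For the interval wavelet system we let

$$A' = \Bigl\{\omega \in \Omega^n : (\forall \ell \in [-1, J-\log J])(\forall k \in [0, 2^{J-\log J-\ell}-1]),\ \Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\cdot\alpha_{i,J-\log J-\ell,k}\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\},$$

and

$$A'' = \Bigl\{\omega \in \Omega^n : (\forall \ell \in [-1, J-\log J])(\forall k \in [0, 2^{J-\log J-\ell}-1]),\ \Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\cdot\beta_{i,J-\log J-\ell,k}\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\}.$$

Then $A = A' \cap A''$.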
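We also let

$$A'_{\ell,k} = \Bigl\{\omega \in \Omega^n : \Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\cdot\alpha_{i,J-\log J-\ell,k}\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\}$$

and

$$A''_{\ell,k} = \Bigl\{\omega \in \Omega^n : \Bigl|\sum_{i=0}^{J2^\ell-1} e_{kJ2^\ell+i+1}(\omega)\cdot\beta_{i,J-\log J-\ell,k}\Bigr| \le bJ2^{\ell/2}\sqrt{2^{-1}\ln 2}\Bigr\}.$$

Moreover, we let $A'_\ell = \bigcap_k A'_{\ell,k}$ and $A''_\ell = \bigcap_k A''_{\ell,k}$. Then $A_\ell = A'_\ell \cap A''_\ell$, hence $\overline{A_\ell} = \overline{A'_\ell} \cup \overline{A''_\ell}$.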



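By the same proof as for the Haar wavelets above, $P(\overline{A'_\ell}) \le \frac{2^{1-\ell}}{\log n}$ and $P(\overline{A''_\ell}) \le \frac{2^{1-\ell}}{\log n}$. Hence, $P(\overline{A_\ell}) \le \frac{2^{2-\ell}}{\log n}$.

Since $A = \bigcap_{\ell=-1}^{J-\log J} A_\ell$, we obtain by a similar calculation as in the Haar case:

$$P(A) \ge 1 - \frac{16}{\log n} + \frac{4}{n}.$$

□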
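The counting in the Haar part of this proof can be checked with exact rational arithmetic. The sketch below (hypothetical helper names; it assumes, as in subsection 3.1, that $J$ is itself a power of 2) adds the per-block failure probability $2/n$ from (1) over all pairs $(\ell, k)$ and confirms that the total equals $\frac{8}{\log n} - \frac{2}{n}$. It also evaluates the right-hand side of Hoeffding's inequality for a single block with the chosen $t$, which should come out as $2/n$.

```python
import math
from fractions import Fraction


def haar_failure_bound(J):
    """Union bound on P(complement of A) in the Haar case of Lemma 3.2.

    Assumes n = 2**J with J a power of 2, so that log J is an integer.
    Each block (ell, k) fails with probability at most 2/n by inequality (1).
    """
    log_J = int(math.log2(J))
    n = 2 ** J
    total = Fraction(0)
    for ell in range(-1, J - log_J + 1):
        n_blocks = 2 ** (J - log_J - ell)   # k ranges over [0, 2^(J - log J - ell) - 1]
        total += n_blocks * Fraction(2, n)
    return total


def hoeffding_block_bound(J, ell, b=1.0):
    """2 exp(-2 t^2 / (m b^2)) for one block of length m = J 2^ell,
    with t = b J 2^(ell/2) sqrt(ln 2 / 2)."""
    m = J * 2.0 ** ell
    t = b * J * 2.0 ** (ell / 2) * math.sqrt(math.log(2) / 2)
    return 2 * math.exp(-2 * t ** 2 / (m * b ** 2))
```

For instance, $J = 16$ (that is, $n = 2^{16}$) gives $P(A) \ge \frac12 + \frac{2}{n}$ in the Haar case.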



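Lemma 3.3 For all $\omega \in A$, all $j \in [J_0, J[$, and all $k \in [0, 2^j - 1]$, we have (for some constant $C \ge 1$, depending only on the wavelet system):

$$|d^{(e)}_{j,k}(\omega)| \le C\,b\sqrt{\frac{\log n}{n}},$$

and for all $k \in [0, 2^{J_0} - 1]$,

$$|c^{(e)}_{J_0,k}(\omega)| \le C\,b\sqrt{\frac{\log n}{n}}.$$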



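Proof: We consider two cases for $j$.

Case 1: $J_0 \le j \le J - \log J + 1$.

We write $j$ as $J - \log J - \ell$, where $-1 \le \ell \le J - \log J - J_0$. Let us first consider Haar wavelets. By (H2) (in Lemma 2.1) we have

$$|d^{(e)}_{j,k}| = 2^{-J+j/2}\Bigl|\sum_{i=0}^{2^{J-j-1}-1} e_{i+1+k2^{J-j}} - \sum_{i=0}^{2^{J-j-1}-1} e_{i+1+(k+\frac12)2^{J-j}}\Bigr|.$$

Since $\omega \in A$ we can apply the defining property of $A$ to

$$\sum_{i=0}^{J2^{\ell-1}-1} e_{i+1+k2^{J-j}} = \sum_{i=0}^{J2^{\ell-1}-1} e_{i+1+(2k)J2^{\ell-1}}.$$

Since $2k$ is in the correct range $[0, 2^{j+1} - 2] = [0, 2^{J-\log J-(\ell-1)} - 2]$, we have

$$\Bigl|\sum_{i=0}^{J2^{\ell-1}-1} e_{i+1+k2^{J-j}}\Bigr| \le bJ2^{(\ell-1)/2}\sqrt{2^{-1}\ln 2}.$$

Similarly,

$$\Bigl|\sum_{i=0}^{J2^{\ell-1}-1} e_{i+1+(k+\frac12)2^{J-j}}\Bigr| = \Bigl|\sum_{i=0}^{J2^{\ell-1}-1} e_{i+1+(2k+1)J2^{\ell-1}}\Bigr| \le bJ2^{(\ell-1)/2}\sqrt{2^{-1}\ln 2};$$

we used the defining property of $A$, since the range of $2k+1$ is contained in $[0, 2^{j+1} - 2 + 1] = [0, 2^{J-\log J-(\ell-1)} - 1]$.

By combining these two bounds we obtain

$$|d^{(e)}_{j,k}| \le 2^{-J+j/2}\cdot 2\cdot bJ2^{(\ell-1)/2}\sqrt{2^{-1}\ln 2} = b\sqrt{\ln 2}\,\sqrt{\frac{\log n}{n}} \le b\sqrt{\frac{\log n}{n}}.$$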


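Let us now consider case 1 for the interval wavelet system. By (D2) in Lemma 2.2,

$$|d^{(e)}_{j,k}| = 2^{-J+j/2}\Bigl|\sum_{i=0}^{J2^\ell-1}\beta_{i,j,k}\,e_{i+1+kJ2^\ell}\Bigr|.$$

Since $\omega \in A$,

$$|d^{(e)}_{j,k}| \le 2^{-J+j/2}\cdot bJ2^{\ell/2}\sqrt{2^{-1}\ln 2} = b2^{(-J+\log J)/2}\sqrt{2^{-1}\ln 2} = b\sqrt{\frac{\log n}{n}}\,\sqrt{2^{-1}\ln 2}.$$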
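Case 2: $J - \log J + 2 \le j \le J - 1$.

For the Haar wavelets we use the boundedness of the noise, $|e_i - e_{i'}| \le b$. Hence, by (H2),

$$|d^{(e)}_{j,k}| \le 2^{-J+j/2}\,b\,2^{J-j-1} = b2^{-j/2-1} \le b\sqrt{\frac{\log n}{n}}.$$

For the interval wavelet system, (D2) yields

$$|d^{(e)}_{j,k}| \le 2^{-J+j/2}\,C_\psi\,\frac{b}{2}\,2^{J-j} = C_\psi\,\frac{b}{2}\,2^{-j/2} \le C_\psi\,b\sqrt{\frac{\log n}{n}},$$

by using $j \ge J - \log J + 2$ for the last inequality.

By an argument similar to the above we obtain the bound for $|c^{(e)}_{J_0,k}|$. □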
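The exponent bookkeeping behind the two Case 1 bounds can be spelled out as a worked equation. It only uses $j = J - \log J - \ell$, $n = 2^J$ and $J = \log n$ from the proof above; the first line is the interval-wavelet case, the second the Haar case, where two half-blocks at level $\ell - 1$ appear.

```latex
2^{-J+j/2}\, bJ\,2^{\ell/2}\sqrt{2^{-1}\ln 2}
  = bJ\,2^{-J+(J-\log J)/2}\sqrt{2^{-1}\ln 2}
  = b\,\frac{J}{\sqrt{J}\,\sqrt{n}}\sqrt{2^{-1}\ln 2}
  = b\sqrt{\frac{\log n}{n}}\,\sqrt{2^{-1}\ln 2},

2^{-J+j/2}\cdot 2\cdot bJ\,2^{(\ell-1)/2}\sqrt{2^{-1}\ln 2}
  = \sqrt{2}\cdot b\sqrt{\frac{\log n}{n}}\,\sqrt{2^{-1}\ln 2}
  = b\sqrt{\ln 2}\,\sqrt{\frac{\log n}{n}}.
```

Both right-hand sides are at most $b\sqrt{\log n/n}$ because $\ln 2 < 1$.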

To implement wavelet shrinkage we need two parameters: a decomposition level $J_0$ and a threshold $\lambda_{n,\delta}$. We define

and we choose $J_0$ so that $J_0 \le J_1$.

For the Haar wavelets (when $0 < \alpha \le 1$) we can simply pick $J_0 = 0$, but for the interval wavelet system (when $1 < \alpha$ and we have $N = \lceil\alpha\rceil$ vanishing moments), we also require (see [4]) that $J_0 \ge 1 + \log(2N - 1)$. When $\alpha > 1$ we choose

$$J_0 = 1 + \lceil\log(2\lceil\alpha\rceil - 1)\rceil.$$

Thus, for $J_0$ to exist (when $\alpha > 1$) we need $n = 2^J$ to be such that $1 + \log(2\lceil\alpha\rceil - 1) \le J_1$. A sufficient condition for this is that $J - \log J \ge (1 + \log(2\alpha + 1))(1 + 2\alpha)$, or equivalently, $\frac{n}{\log n} \ge (4\alpha + 2)^{2\alpha+1}$.

By using the fact that $\frac{n}{\log n}$ is an increasing function of $n$, and that the relation $\frac{y}{\log y} \ge x$ is implied by $y \ge x \cdot \log x \cdot \log\log x$, we have the following sufficient condition on $n$. When $\alpha > 1$ we assume that

$$n \ge (4\alpha + 2)^{2\alpha+2} \cdot (\log(4\alpha + 2))^2.$$

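We use the threshold

$$\lambda_{n,\delta} = C\,b\,\bigl(1 + 2\sqrt{(1+\delta)\ln 2}\bigr)\sqrt{\frac{\log n}{n}},$$

where $C$ is the constant of Lemma 3.3.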
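The form of $\lambda_{n,\delta}$ is what makes Lemma 3.4 work: on $A$, Lemma 3.3 gives $Cb\sqrt{\log n/n}$, and moving from $A$ to $B_\delta$ adds at most $2b\sqrt{(1+\delta)\ln n}/\sqrt n$. The Python sketch below (illustrative function names; $C \ge 1$ is treated as a free parameter, since its value depends on the wavelet system) checks numerically that this sum never exceeds $\lambda_{n,\delta}$, with equality when $C = 1$, because $\ln n = \ln 2 \cdot \log n$.

```python
import math


def threshold(n, delta, b, C):
    """lambda_{n,delta} = C b (1 + 2 sqrt((1 + delta) ln 2)) sqrt(log2(n) / n)."""
    return C * b * (1 + 2 * math.sqrt((1 + delta) * math.log(2))) * math.sqrt(math.log2(n) / n)


def perturbed_bound(n, delta, b, C):
    """Bound from Lemma 3.3 on A, plus the B_delta perturbation divided by sqrt(n)."""
    on_A = C * b * math.sqrt(math.log2(n) / n)
    shift = 2 * b * math.sqrt((1 + delta) * math.log(n)) / math.sqrt(n)
    return on_A + shift
```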
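The first step of the wavelet shrinkage algorithm is the DWT, which maps $(y_1, \ldots, y_n)$ to

$$\sqrt{n}\,\bigl(c^{(y)}_{J_0,0}, \ldots, c^{(y)}_{J_0,2^{J_0}-1}, d^{(y)}_{J_0,0}, \ldots, d^{(y)}_{J-1,2^{J-1}-1}\bigr),$$

where $n = 2^J$. Since $y_i = f_i + e_i$ and the DWT is linear, we have

$$c^{(y)}_{J_0,k} = c^{(f)}_{J_0,k} + c^{(e)}_{J_0,k}$$

and

$$d^{(y)}_{j,k} = d^{(f)}_{j,k} + d^{(e)}_{j,k},$$

where $c^{(f)}_{J_0,k}, d^{(f)}_{j,k}$ and $c^{(e)}_{J_0,k}, d^{(e)}_{j,k}$ are the wavelet coefficients for $(f_1, \ldots, f_n)$ and $(e_1, \ldots, e_n)$, respectively.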

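The second step of wavelet shrinkage is thresholding. We shall prove our result for soft thresholding, but it will be easy to see from our proofs that our results hold for hard thresholding too. For soft thresholding, we have

$$\hat d_{j,k} = \begin{cases} d^{(y)}_{j,k} - \lambda_{n,\delta} & \text{if } d^{(y)}_{j,k} > \lambda_{n,\delta},\\ 0 & \text{if } |d^{(y)}_{j,k}| \le \lambda_{n,\delta},\\ d^{(y)}_{j,k} + \lambda_{n,\delta} & \text{if } d^{(y)}_{j,k} < -\lambda_{n,\delta}.\end{cases}$$

The last step of wavelet shrinkage is the inverse DWT, which yields $(\hat y_1, \ldots, \hat y_n)$. If we let

$$\hat y(x) = \sum_{k=0}^{2^{J_0}-1} c^{(y)}_{J_0,k}\varphi_{J_0,k}(x) + \sum_{j=J_0}^{J-1}\sum_{k=0}^{2^j-1}\hat d_{j,k}\psi_{j,k}(x), \qquad (2)$$

then we obtain $\hat y_i = \hat y(\frac{i}{n})$ for $i = 1, \ldots, n$.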



3.2 Application of Talagrand's theorem 

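Let $W$ be the orthogonal matrix that represents the DWT. Let $A \subseteq \Omega^n$ be as above. For any $\delta > 0$ we define the following subset of $\Omega^n$:

$$B_\delta = \Bigl\{\omega' \in \Omega^n : (\forall \ell \in [1, n])\ \inf_{\omega \in A}\Bigl|\sum_{i=1}^n W_{\ell,i}\bigl(e_i(\omega') - e_i(\omega)\bigr)\Bigr| \le 2b\sqrt{(1+\delta)\ln n}\Bigr\}.$$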



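Lemma 3.4 For all $\omega' \in B_\delta$ and all $k \in [0, 2^{J_0} - 1]$: $|c^{(e)}_{J_0,k}(\omega')| \le \lambda_{n,\delta}$. For all $j \in [J_0, J-1]$ and $k \in [0, 2^j - 1]$: $|d^{(e)}_{j,k}(\omega')| \le \lambda_{n,\delta}$.

Proof: By the definition of $B_\delta$ (recall that the DWT output is $\sqrt{n}$ times the wavelet coefficients), for every $\omega' \in B_\delta$ there exists $\omega \in A$ such that

$$\sqrt{n}\,|c^{(e)}_{J_0,k}(\omega') - c^{(e)}_{J_0,k}(\omega)| \le 2b\sqrt{(1+\delta)\ln n}$$

and (for a possibly different $\omega \in A$)

$$\sqrt{n}\,|d^{(e)}_{j,k}(\omega') - d^{(e)}_{j,k}(\omega)| \le 2b\sqrt{(1+\delta)\ln n}.$$

Since $\ln n = \ln 2 \cdot \log n$, these differences are at most $2b\sqrt{(1+\delta)\ln 2}\,\sqrt{\frac{\log n}{n}}$. The Lemma then follows from Lemma 3.3, because $C \ge 1$ gives $Cb\sqrt{\frac{\log n}{n}} + 2b\sqrt{(1+\delta)\ln 2}\,\sqrt{\frac{\log n}{n}} \le \lambda_{n,\delta}$. □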

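For what follows we use the threshold $\lambda_{n,\delta}$ as above; we let $n_0 = 2^{\dots}$ when $0 < \alpha \le 1$, and $n_0 = (4\alpha + 2)^{2\alpha+2} \cdot (\log(4\alpha + 2))^2$ when $\alpha > 1$.

Lemma 3.5 When $n \ge n_0$, $P(B_\delta) \ge 1 - \frac{2}{n^{1+\delta}}$.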
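Proof: We first prove that

$$\{\omega' \in \Omega^n : d_T(\omega', A) \le 2\sqrt{(1+\delta)\ln n}\} \subseteq B_\delta.$$

Recall the definition

$$d_T(\omega', A) = \sup\Bigl\{\inf\Bigl\{\sum_{i=1}^n\beta_i\,I(\omega'_i \ne \omega_i) : (\omega_1, \ldots, \omega_n) \in A\Bigr\} : (\beta_1, \ldots, \beta_n) \in \mathbb{R}^n,\ \sum_{i=1}^n\beta_i^2 = 1\Bigr\}.$$

We will choose the following $n$ vectors for $\beta = (\beta_1, \ldots, \beta_n)$ in the above formula:

$$(|W_{\ell,1}|, \ldots, |W_{\ell,n}|), \quad\text{for } \ell = 1, \ldots, n.$$

Since $W$ is orthogonal, all its row vectors have unit length. For all $\omega' \in \Omega^n$, $\omega = (\omega_1, \ldots, \omega_n) \in A$, and $1 \le \ell \le n$, we have:

$$\Bigl|\sum_{i=1}^n W_{\ell,i}\bigl(e_i(\omega') - e_i(\omega)\bigr)\Bigr| \le \sum_{i=1}^n |W_{\ell,i}|\,|e_i(\omega') - e_i(\omega)| \le \sum_{i=1}^n |W_{\ell,i}|\cdot I(\omega'_i \ne \omega_i)\cdot b.$$

(The last inequality follows from $|e_i(\omega') - e_i(\omega)| \le b$ and from the fact that $I(e_i(\omega') \ne e_i(\omega)) \le I(\omega'_i \ne \omega_i)$, because $e_i(\omega') \ne e_i(\omega)$ implies $\omega'_i \ne \omega_i$.)

Hence, for all $\omega' \in \Omega^n$ and $1 \le \ell \le n$,

$$\inf\Bigl\{\Bigl|\sum_{i=1}^n W_{\ell,i}\bigl(e_i(\omega') - e_i(\omega)\bigr)\Bigr| : \omega \in A\Bigr\} \le \inf\Bigl\{\sum_{i=1}^n |W_{\ell,i}|\cdot I(\omega'_i \ne \omega_i)\,b : \omega \in A\Bigr\} = b\,\inf\Bigl\{\sum_{i=1}^n |W_{\ell,i}|\cdot I(\omega'_i \ne \omega_i) : \omega \in A\Bigr\}.$$

Therefore, if $d_T(\omega', A) \le 2\sqrt{(1+\delta)\ln n}$ then for all $1 \le \ell \le n$,

$$\inf\Bigl\{\Bigl|\sum_{i=1}^n W_{\ell,i}\bigl(e_i(\omega') - e_i(\omega)\bigr)\Bigr| : \omega \in A\Bigr\} \le 2b\sqrt{(1+\delta)\ln n}.$$

This means that $\omega' \in B_\delta$, and this proves that

$$\{\omega' \in \Omega^n : d_T(\omega', A) \le 2\sqrt{(1+\delta)\ln n}\} \subseteq B_\delta.$$

Hence, $P(B_\delta) \ge P(\{\omega' \in \Omega^n : d_T(\omega', A) \le 2\sqrt{(1+\delta)\ln n}\})$. By Talagrand's theorem this is $\ge 1 - \exp(-(1+\delta)\ln n)\cdot\frac{1}{P(A)} \ge 1 - \frac{2}{n^{1+\delta}}$. □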
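For reference, here is the arithmetic behind the last step, obtained by substituting $t = 2\sqrt{(1+\delta)\ln n}$ into the corollary of Theorem 2.3; the final inequality holds whenever $P(A) \ge \frac12$.

```latex
t = 2\sqrt{(1+\delta)\ln n}
\;\Longrightarrow\;
\frac{1}{P(A)}\exp\Bigl(-\frac{t^2}{4}\Bigr)
  = \frac{1}{P(A)}\exp\bigl(-(1+\delta)\ln n\bigr)
  = \frac{1}{P(A)\,n^{1+\delta}}
  \le \frac{2}{n^{1+\delta}}
  \quad\text{whenever } P(A) \ge \tfrac12 .
```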
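Lemma 3.6 For all $\omega' \in B_\delta$ we have:

(1) When $J_1 \le j < J$, $0 \le k < 2^j$: $|\hat d_{j,k}(\omega') - d^{(f)}_{j,k}| \le |d^{(f)}_{j,k}| \le C_\Lambda M\cdot 2^{-j(\frac12+\alpha)}$.

(2) When $J_0 \le j < J_1$, $0 \le k < 2^j$: $|\hat d_{j,k}(\omega') - d^{(f)}_{j,k}| \le 2\lambda_{n,\delta}$.

Proof: To prove (1), we note first that by (H3) and (D3) we have $|d^{(f)}_{j,k}| \le C_\Lambda M2^{-j(\frac12+\alpha)}$ (with $C_\Lambda = 1$ in the Haar case).

To prove the inequality $|\hat d_{j,k} - d^{(f)}_{j,k}| \le |d^{(f)}_{j,k}|$ one considers six cases, according to the possible relative positions of $0$, $d^{(f)}_{j,k}$, and $\hat d_{j,k}$. If $0 \le \hat d_{j,k} \le d^{(f)}_{j,k}$, or if $d^{(f)}_{j,k} \le \hat d_{j,k} \le 0$, the inequality is obvious from the order picture. The other four cases are not possible, since they would imply that $|d^{(e)}_{j,k}(\omega')| > \lambda_{n,\delta}$, contradicting Lemma 3.4. This proves (1).

For the proof of (2) we consider two cases. If $\hat d_{j,k} = 0$, then $|d^{(y)}_{j,k}| \le \lambda_{n,\delta}$, hence $|\hat d_{j,k} - d^{(f)}_{j,k}| = |d^{(f)}_{j,k}| = |d^{(y)}_{j,k} - d^{(e)}_{j,k}| \le |d^{(y)}_{j,k}| + |d^{(e)}_{j,k}| \le \lambda_{n,\delta} + \lambda_{n,\delta}$. In the second case, $|d^{(y)}_{j,k}| > \lambda_{n,\delta}$, and $|\hat d_{j,k} - d^{(f)}_{j,k}| = |d^{(e)}_{j,k} \mp \lambda_{n,\delta}| \le |d^{(e)}_{j,k}| + \lambda_{n,\delta} \le 2\lambda_{n,\delta}$, which proves the inequality. □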
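Both inequalities of Lemma 3.6 are statements about a single coefficient, so they can be probed by random sampling. The sketch below is a sanity check, not a proof. It assumes soft thresholding, draws stand-ins for $d^{(f)}_{j,k}$ uniformly, and draws noise coefficients with $|d^{(e)}_{j,k}| \le \lambda$ (the situation guaranteed on $B_\delta$ by Lemma 3.4); the value of `lam` is arbitrary.

```python
import numpy as np


def soft(z, lam):
    """Soft thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


rng = np.random.default_rng(1)
lam = 0.7
d_f = rng.uniform(-5.0, 5.0, size=100_000)     # stand-ins for d^(f)_{j,k}
d_e = rng.uniform(-lam, lam, size=d_f.size)    # |d^(e)_{j,k}| <= lam, as on B_delta
err = np.abs(soft(d_f + d_e, lam) - d_f)       # |hat d_{j,k} - d^(f)_{j,k}|
print(err.max() / lam)                         # Lemma 3.6 (2) says this is at most 2
```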

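Theorem 3.7 (Deviation bound for max square error) For wavelet shrinkage with threshold $\lambda_{n,\delta}$ we have for all $n \ge n_0$ and all $\delta > 0$:

$$P\Bigl(\max_{1 \le i \le n}(f_i - \hat y_i)^2 \le (c_1 + c_2\delta)\Bigl(\frac{\log n}{n}\Bigr)^{\frac{2\alpha}{1+2\alpha}}\Bigr) \ge 1 - \frac{2}{n^{1+\delta}},$$

where $c_1$ and $c_2$ depend only on $b$, $M$, and $\alpha$.

As a consequence (deviation bound for mean square error), since the mean of the $(f_i - \hat y_i)^2$ never exceeds their maximum,

$$P\Bigl(\frac{1}{n}\sum_{i=1}^n(f_i - \hat y_i)^2 \le (c_1 + c_2\delta)\Bigl(\frac{\log n}{n}\Bigr)^{\frac{2\alpha}{1+2\alpha}}\Bigr) \ge 1 - \frac{2}{n^{1+\delta}}.$$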

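Proof: At the beginning of subsection 2.1 we defined the function $\bar f$ and its wavelet coefficients. We have

$$\bar f(x) = \sum_{k=0}^{2^{J_0}-1} c^{(f)}_{J_0,k}\varphi_{J_0,k}(x) + \sum_{j=J_0}^{J_1-1}\sum_{k=0}^{2^j-1} d^{(f)}_{j,k}\psi_{j,k}(x) + \sum_{j=J_1}^{J-1}\sum_{k=0}^{2^j-1} d^{(f)}_{j,k}\psi_{j,k}(x),$$

and $f_i = \bar f(\frac{i}{n})$ for $1 \le i \le n$.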

In connection with the thresholding of $y$ we define the function

$$\hat y(x) = \sum_{k=0}^{2^{J_0}-1} c^{(y)}_{J_0,k}\varphi_{J_0,k}(x) + \sum_{j=J_0}^{J_1-1}\sum_{k=0}^{2^j-1}\hat d_{j,k}\psi_{j,k}(x) + \sum_{j=J_1}^{J-1}\sum_{k=0}^{2^j-1}\hat d_{j,k}\psi_{j,k}(x).$$



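By Lemma 3.4 we have for all $\omega' \in B_\delta$:

(0) $|c^{(y)}_{J_0,k} - c^{(f)}_{J_0,k}| = |c^{(e)}_{J_0,k}(\omega')| \le \lambda_{n,\delta}$.

By Lemma 3.6 we have for all $\omega' \in B_\delta$:

(1) $|\hat d_{j,k} - d^{(f)}_{j,k}| \le |d^{(f)}_{j,k}| \le C_\Lambda M\cdot 2^{-j(\frac12+\alpha)}$ for $J_1 \le j < J$, $0 \le k < 2^j$;

(2) $|\hat d_{j,k} - d^{(f)}_{j,k}| \le 2\lambda_{n,\delta}$ for $J_0 \le j < J_1$, $0 \le k < 2^j$.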

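Let us first deal with the case of Haar wavelets (when $\alpha \le 1$). For a given $j$, the supports of different Haar wavelets do not overlap. Therefore, since $|\varphi_{j,k}|, |\psi_{j,k}| \le 2^{j/2}$, for all $x \in\, ]0,1]$ there exist $K_1$ and $K(j)$ such that

$$|\bar f(x) - \hat y(x)| \le 2^{J_0/2}|c^{(y)}_{J_0,K_1} - c^{(f)}_{J_0,K_1}| + \sum_{j=J_0}^{J_1-1}2^{j/2}|\hat d_{j,K(j)} - d^{(f)}_{j,K(j)}| + \sum_{j=J_1}^{J-1}2^{j/2}|\hat d_{j,K(j)} - d^{(f)}_{j,K(j)}|.$$

This and (0), (1), (2) imply for all $x \in\, ]0,1]$:

$$|\bar f(x) - \hat y(x)| \le C_1\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}} + C_2\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}} + C_3\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}} \le (c'_1 + c'_2\sqrt{1+\delta})\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}},$$

where $C_1$ and $C_2$ come from (0) and (2), and therefore depend on $\delta$ only through the factor $1 + 2\sqrt{(1+\delta)\ln 2}$ of $\lambda_{n,\delta}$, while $C_3$ comes from (1).

Letting $x = \frac{i}{n}$ ($1 \le i \le n$) we obtain for all $\omega' \in B_\delta$:

$$|f_i - \hat y_i(\omega')| = |\bar f(\tfrac{i}{n}) - \hat y(\tfrac{i}{n})| \le (c'_1 + c'_2\sqrt{1+\delta})\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}}.$$

Squaring, and using $(u+v)^2 \le 2u^2 + 2v^2$, gives $\max_i(f_i - \hat y_i)^2 \le (c_1 + c_2\delta)(\frac{\log n}{n})^{\frac{2\alpha}{1+2\alpha}}$ with $c_1 = 2c_1'^2 + 2c_2'^2$ and $c_2 = 2c_2'^2$. In the Haar case the theorem follows from this and the fact that $P(B_\delta) \ge 1 - \frac{2}{n^{1+\delta}}$ (when $n \ge n_0$).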

For wavelets on the interval (when $\alpha > 1$, and the number of vanishing moments is $N = \lceil\alpha\rceil$), there are never more than $2N$ wavelets that overlap (for a given $j$). Indeed, in the above sums we have for each $j$ and each $x$: $0 \le 2^jx - k \le 2N - 1$. (Other values of $k$ would place the argument $2^jx - k$ of the wavelet functions outside of the support and would hence only produce zero terms in the sums.) Hence $k$ only needs to range from $\lfloor 2^jx\rfloor - 2N + 1$ through $\lfloor 2^jx\rfloor$, which corresponds to $2N$ values of $k$.

Hence, the same calculation as for Haar wavelets applies, except that the constants $C_1$, $C_2$, $C_3$, $c'_1$, $c'_2$ need to be multiplied by $2N$. □
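The exponent $\frac{\alpha}{1+2\alpha}$ comes from balancing the terms controlled by (2) and (1). The definition of $J_1$ did not survive in this copy of the text; the sketch below assumes that $2^{J_1}$ is of order $(n/\log n)^{1/(1+2\alpha)}$, which is consistent with the condition $J - \log J \ge (1+2\alpha)(1 + \log(2\alpha+1))$ used above to guarantee $J_0 \le J_1$. Constants depending only on $b$, $M$, $\alpha$ and the wavelet system are suppressed in $\lesssim$ and $\asymp$.

```latex
\sum_{j=J_0}^{J_1-1} 2^{j/2}\cdot 2\lambda_{n,\delta}
  \lesssim \lambda_{n,\delta}\,2^{J_1/2}
  \asymp \bigl(1+2\sqrt{(1+\delta)\ln 2}\bigr)
         \Bigl(\frac{\log n}{n}\Bigr)^{\frac12}\Bigl(\frac{n}{\log n}\Bigr)^{\frac{1}{2(1+2\alpha)}}
  = \bigl(1+2\sqrt{(1+\delta)\ln 2}\bigr)\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}},

\sum_{j=J_1}^{J-1} 2^{j/2}\cdot M\,2^{-j(\frac12+\alpha)}
  = M\sum_{j=J_1}^{J-1} 2^{-j\alpha}
  \lesssim M\,2^{-J_1\alpha}
  \asymp M\Bigl(\frac{\log n}{n}\Bigr)^{\frac{\alpha}{1+2\alpha}}.
```

The coarse term from (0) is of order $\lambda_{n,\delta} \lesssim (\log n/n)^{1/2}$, which is smaller still, since $\frac{\alpha}{1+2\alpha} < \frac12$.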

Appendix 

Proof of Lemma 2.1 

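Properties (H1) and (H2) follow from a direct calculation based on the exact formulas for the Haar functions $\varphi_{j,k}$ and $\psi_{j,k}$:

$$c^{(e)}_{j,k} = \int_0^1\bar e(x)\,\varphi_{j,k}(x)\,dx = 2^{j/2}\int_{k2^{-j}}^{(k+1)2^{-j}}\bar e(x)\,dx = 2^{-J+j/2}\sum_{i=0}^{2^{J-j}-1} e_{i+1+k2^{J-j}}.$$

The calculation for (H2) is similar. The same calculation as for (H2) gives for $f$:

$$d^{(f)}_{j,k} = 2^{-J+j/2}\sum_{i=0}^{2^{J-j-1}-1}\Bigl(f\bigl(\tfrac{i+1+k2^{J-j}}{n}\bigr) - f\bigl(\tfrac{i+1+(k+\frac12)2^{J-j}}{n}\bigr)\Bigr).$$

Then we use the Hölder condition $|f(\frac{i+1+k2^{J-j}}{n}) - f(\frac{i+1+(k+\frac12)2^{J-j}}{n})| \le M(\frac12 2^{-j})^\alpha$, which gives $|d^{(f)}_{j,k}| \le 2^{-J+j/2}\cdot 2^{J-j-1}\cdot M(\frac12 2^{-j})^\alpha \le M2^{-j(\frac12+\alpha)}$.

□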
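Proof of Lemma 2.2

Property (D1) follows from a direct calculation:

$$c^{(e)}_{j,k} = \sum_{i=1}^n e_i\int_{(i-1)/n}^{i/n}\varphi_{j,k}(x)\,dx,$$

where we denote the functions $\varphi^{\mathrm{left}}_{j,k}$ by $\varphi_{j,2^j-2N+k}$, and $\varphi^{\mathrm{right}}_{j,k}$ by $\varphi_{j,2^j-N+k}$.

For the $\varphi_{j,k}$ "in the middle" of the interval we have

$$\int_{(i-1)/n}^{i/n}\varphi_{j,k}(x)\,dx = 2^{j/2}\int_{(i-1)/n}^{i/n}\varphi(2^jx - k)\,dx = 2^{-J+j/2}\,\alpha_{i,j,k}$$

by the mean-value theorem, for some numbers $\alpha_{i,j,k}$ with $|\alpha_{i,j,k}| \le \sup_{[0,2N-1]}|\varphi|$.

For the $\varphi_{j,2^j-2N+k}$ "at the left end" of the interval,

$$\int_{(i-1)/n}^{i/n}\varphi_{j,2^j-2N+k}(x)\,dx = \sum_\ell(-\ell)^k\,2^{j/2}\int_{(i-1)/n}^{i/n}\varphi(2^jx + \ell)\,dx = 2^{-J+j/2}\sum_\ell(-\ell)^k\gamma_{i,j,\ell}$$

by the mean-value theorem, for some numbers $\gamma_{i,j,\ell}$ with $|\gamma_{i,j,\ell}| \le \sup_{[0,2N-1]}|\varphi|$. By taking

$$\alpha_{i,j,2^j-2N+k} = \sum_\ell(-\ell)^k\gamma_{i,j,\ell}$$

we obtain (D1). At the left end, $k < N$, so $|\alpha_{i,j,2^j-2N+k}| \le 2N(2N-1)^N\cdot\sup|\varphi|$.

The scaling functions "at the right end" of the interval are handled in a similar way. The calculation for (D2) is similar. (D3) follows from the wavelet characterization of Hölder classes ([5], page 299, and [10]). □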

References 

[1] R. Averkamp, Ch. Houdré, "Wavelet Thresholding for Non (Necessarily) Gaussian Noise: Idealism", preprint (http://www.math.gatech.edu~houdre/).

[2] R. Averkamp, Ch. Houdré, "Wavelet Thresholding for Non (Necessarily) Gaussian Noise: Functionality", preprint (http://www.math.gatech.edu~houdre/).

[3] L.D. Brown, M.G. Low, "Superefficiency and lack of adaptability in functional estimation", manuscript.

[4] A. Cohen, I. Daubechies, B. Jawerth, P. Vial, "Multiresolution analysis, wavelets and fast algorithms on an interval", Comptes Rendus de l'Académie des Sciences de Paris, t. 316, Série I (1993) 417-421.

[5] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics (1992).






[6] D. Donoho, I. Johnstone, "Ideal spatial adaptation by wavelet shrinkage", Biometrika 81(3) (1994) 425-455.

[7] D. Donoho, I. Johnstone, G. Kerkyacharian, D. Picard, "Wavelet shrinkage: Asymptopia?", Journal of the Royal Statistical Society, Series B, 57(2) (1995) 301-369.

[8] W. Hoeffding, "Probability inequalities for sums of bounded random variables", Journal of the American Statistical Association 58 (1963) 13-30.

[9] O.V. Lepskii, "On one problem of adaptive estimation on white Gaussian noise", Teor. Veroyatnost. i Primenen. 35 (1990) 459-470 [Russian]; Theory of Probability and Applications 35 (1990) 454-466 [English].

[10] Y. Meyer, Wavelets and Operators, Cambridge University Press (1992).

[11] J.M. Steele, Probability Theory and Combinatorial Optimization, Society for Industrial and Applied Mathematics (1997).

[12] M. Talagrand, "Concentration of measure and isoperimetric inequalities in product spaces", Publications Mathématiques de l'Institut des Hautes Études Scientifiques 81 (1995) 73-205.

[13] B. Vidakovic, Statistical Modeling by Wavelets, Wiley (1999).






